Reinforcement cutting-agent learning for video object segmentation

Abstract

Video object segmentation is a fundamental yet challenging task in computer vision community. In this paper, we formulate this problem as a Markov Decision Process, where agents are learned to segment object regions under a deep reinforcement learning framework.

Essentially, learning agents for segmentation is nontrivial as segmentation is a nearly continuous decision-making process, where the number of the involved agents (pixels or superpixels) and action steps from the seed (super)pixels to the whole object mask might be incredibly huge.

Methodology

To overcome this difficulty, this paper simplifies the learning of segmentation agents to the learning of a cutting-agent, which only has a limited number of action units and can converge in just a few action steps.

The basic assumption is that object segmentation mainly relies on the interaction between object regions and their context. Thus, with an optimal object (box) region and context (box) region, we can obtain the desirable segmentation mask through further inference.

Based on this assumption, we establish a novel reinforcement cutting-agent learning framework, where the cutting-agent consists of two key components:

1. Cutting-Policy Network: Learns policies for deciding optimal object-context box pair.

2. Cutting-Execution Network: Executes the cutting function based on the inferred object-context box pair.

With the collaborative interaction between the two networks, our method can achieve superior performance through reinforcement learning.

Experimental Results

Our method achieves outperforming VOS performance on two public benchmarks, which demonstrates:

• The rationality of our assumption that object segmentation mainly relies on the interaction between object regions and their context

• The effectiveness of the proposed learning framework in simplifying the complex segmentation task into a manageable cutting-agent learning problem

The results show that by formulating video object segmentation as a reinforcement learning problem with a cutting-agent, we can effectively reduce the complexity of the decision-making process while achieving superior segmentation performance.

Keywords: Deep Learning cutting-agent Reinforcement Object Computer vision Video Instance Segmentation

Reinforcement cutting-agent learning for video object segmentation

Abstract

Methodology

Experimental Results

📚 Cite This Work